(57) Abstract



A CPU has an execution unit for operating on data under instruction control. A cache and a buffer register are coupled in parallel to 
an input of the execution unit. The buffer register supplies an information item, such as data or an instruction, to the execution unit upon 
the cache having completed a refill process. 
Extra register minimizes CPU idle cycles during cache refill.



FIELD OF THE INVENTION 

The invention relates to an electronic circuit comprising a CPU with a cache. 
The invention also relates to a method of supplying an information item, such as data or an 
instruction, to an execution unit of a CPU. 


BACKGROUND ART

A CPU typically has one or more cache memories arranged between the data
and instruction inputs of its execution unit on the one hand and the port for connection to main
memory on the other. The caches compensate for the difference in speed between the processing in the
CPU and the fetching of data and instructions from main memory. The successful operation of
the cache relies on the locality principle: program references to memory tend to be clustered in
time and in logical space. Temporal clustering relates to the tendency to reference the same
address more than once within a specific period of time. Spatial clustering relates to the
tendency to fetch data or instructions from logically consecutive memory addresses. The data
and instructions in the main memory are mapped into the cache in blocks of logically coherent
addresses. Below, the term "information item" is used to refer to either data or an instruction
within this context.

A cache read miss occurs when the CPU requests an information item that is
not present in its cache. The cache thereupon has to retrieve the appropriate block from the
main memory or the secondary cache and store it. During this cache refill, the execution unit is
stalled. Various techniques are in use to minimize the number of clock cycles that the
execution unit has to idle as a result of a cache refill.

For example, European patent application 0 543 487 A1 discusses the early-restart
technique. As soon as the requested item arrives from main memory it is sent to the
execution unit without waiting for completion of the writing of the entire block to the cache. A
refinement of this early restart is the out-of-order fetch. The out-of-order fetch lets the main
memory skip all information items located at addresses logically preceding the requested item
in the relevant block. The requested item is sent directly to the execution unit upon retrieval
while the remainder of the block is being retrieved, looping around to fetch the items
previously skipped.
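
The fetch order of an out-of-order fetch can be illustrated with a minimal sketch. The function name, the word-granular offsets and the block size of four are assumptions made for illustration only and are not taken from the cited reference.

```python
from typing import List


def wraparound_fetch_order(block_size: int, requested_offset: int) -> List[int]:
    """Return the offsets of a block in the order they are fetched.

    The requested item comes first; the fetch then continues to the end of
    the block and loops around to the items that were skipped.
    """
    if not 0 <= requested_offset < block_size:
        raise ValueError("requested_offset must lie inside the block")
    return [(requested_offset + i) % block_size for i in range(block_size)]


# Hypothetical block of four items, with the miss on the item at offset 2.
print(wraparound_fetch_order(4, 2))  # [2, 3, 0, 1]
```

In this sketch the execution unit could receive the item at offset 2 as soon as it arrives, while the items at offsets 3, 0 and 1 are still being written to the cache.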

European patent application 0 543 487 A1 also discusses an alternative
technique that involves the following steps. If the CPU fetches data during a data cache fill
and the requested data is part of the memory block currently being filled, the
data is retrieved and returned to the execution unit simultaneously with its writing into the
cache, provided the data has not yet been written into the cache. If the data has already been
written into the cache, the data is retrieved and returned to the execution unit at the next read cycle.
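
The decision made in this alternative technique can be sketched as follows. The function name, its parameters and the returned labels are hypothetical; the case of a request outside the block being filled is not described in the passage above and is only marked as such.

```python
def read_during_fill(request_block, request_offset, fill_block, written_offsets):
    """Decide how a data fetch is served while a cache fill is in progress.

    written_offsets holds the offsets of the block that have already been
    written into the cache.
    """
    if request_block != fill_block:
        # Not covered by the passage; handled by some other mechanism.
        return "not covered"
    if request_offset not in written_offsets:
        # Data is returned to the execution unit while it is being written.
        return "forward with write"
    # Data is already in the cache and is returned at the next read cycle.
    return "read from cache at next read cycle"


# Block 7 is being filled; offsets 0 and 1 have been written so far.
print(read_during_fill(7, 2, 7, {0, 1}))  # forward with write
print(read_during_fill(7, 1, 7, {0, 1}))  # read from cache at next read cycle
```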

Also see, e.g., "MIPS RISC Architecture", Gerry Kane and Joe Heinrich,
Prentice Hall, 1992, notably Chapter 5, page 5-5. In the implementations of MIPS processor
architectures, e.g., the R2000 and R3000, a typical sequence of events occurring after a cache
miss is the following. On a cache miss, the processor reads one word from memory and stalls
while the designated blocks in the cache are refilled. After the refill has been completed,
missed information items are retrieved from the cache and are supplied to the processor's
execution unit to resume processing. For general background information on the MIPS
architecture, also see, e.g., "Structured Computer Organization", A.S. Tanenbaum, Prentice
Hall International Editions, third edition, 1990, especially pp. 472-487.

OBJECT OF THE INVENTION 
The advantages of early restart are limited if the execution unit processes the
requested item faster than the cache can complete the refill. In that case, the execution
unit has to idle after processing the item that was received directly until the cache has been refilled.

The alternative technique in the prior art reference discussed above addresses
the problem of reducing the number of idle cycles of the execution unit while the cache is
being refilled. This prior art reference does not address the problem of reducing the number of
idle cycles when the refill has been, or nearly has been, completed. It is an object of the
invention to increase performance of the processor by reducing the number of idle cycles
substantially near completion of the cache refill.

SUMMARY OF THE INVENTION

To this end, the invention provides an electronic circuit comprising a CPU, an
input for receipt of an information item, and a cache between the input and an execution unit
of the CPU. The execution unit is operative to process the item. The circuit further comprises a
buffer between the input and the execution unit, and a controller connected to the buffer. The
controller controls the storing of the information item in the buffer and the supply of the item
to the execution unit substantially near completion of a cache refill.

The inventor proposes to use a temporary buffer in order to prevent the CPU
from idling at least during the step wherein an item is being retrieved from the cache upon
completion of the refill. The item is provided from the buffer instead of from the cache near
completion of the refill. In this way, at least one clock cycle is saved per cache miss, since the
buffer register is not address-controlled like a cache.

The circuit of the invention can use the buffer in combination with a main
memory capable of early restart and out-of-order fetch as mentioned above. The early
restart/out-of-order fetch allows reducing the number of CPU idling cycles preceding the
cache refill, and the buffer register in the invention reduces the number of CPU idling cycles
after the cache has been refilled or has nearly completed refilling.

BRIEF DESCRIPTION OF THE DRAWINGS 
The invention is explained below in further detail and by way of example with
reference to the accompanying drawings, wherein:

Fig.1 is a block diagram of a circuit of the invention; and

Fig.2 is a diagram illustrating part of a cache controller's finite state machine.

PREFERRED EMBODIMENTS

Fig.1 is a functional block diagram with main components of an electronic
circuit 100 according to the invention. Circuit 100 comprises a CPU 102, a bus controller 104
and a main memory 106 interconnected via a bus 108. CPU 102 has a bus interface 110, an
execution unit 112, an instruction cache 114, a data cache 116, an instruction cache controller
118 and a data cache controller 120 for control of caches 114 and 116, respectively. CPU 102
further comprises a buffer register 122. In this example, buffer 122 and data cache 116 are
arranged in parallel between controller 120 and unit 112. Buffer 122 and cache 116 are
coupled to a data input of unit 112 via a multiplexer 124 that is controlled by controller 120.

Controller 120 controls buffer 122 to store data so that it can be supplied to
execution unit 112 substantially at or near completion of a refill of cache 116 upon a read
miss. In this manner, at least one clock cycle is saved during which unit 112 would otherwise idle.
Buffer 122 supplies one or more data items to unit 112, e.g., when cache 116 is preparing for
the first read cycle after the refill. The cache read cycle and the supply via buffer register 122
thus overlap in time.

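
The saving can be made concrete with a rough cycle count. The following is a minimal sketch, not a model of any actual processor: the function name and the cycle counts (four refill cycles, one cache read cycle) are assumptions chosen purely for illustration.

```python
def stall_cycles(refill_cycles: int, cache_read_cycles: int, use_buffer: bool) -> int:
    """Stall cycles seen by the execution unit for a single read miss."""
    if use_buffer:
        # The item is supplied from the buffer while the cache prepares its
        # first read cycle, so the cache read phase adds no stall.
        return refill_cycles
    # Without the buffer the unit waits for the refill and then for the read.
    return refill_cycles + cache_read_cycles


# Illustrative numbers only.
without_buffer = stall_cycles(4, 1, use_buffer=False)
with_buffer = stall_cycles(4, 1, use_buffer=True)
print(without_buffer - with_buffer)  # 1
```

Under these assumed numbers one stall cycle is saved per miss, which corresponds to the "at least one clock cycle" stated above.
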
Buffer 122 stores at least the data that was requested by CPU 102 and that
caused the cache read miss. Buffer 122 may also store further data items at the addresses
logically successive to the address of the requested data. Cache 116 also stores the requested
data and the logically next data in order to comply with the cache's locality principle
mentioned above.

The instruction path between cache 114 and unit 112 contains a buffer
126 and multiplexer 128 arrangement similar to that of the data path between cache 116 and unit 112,
and it is controlled by instruction cache controller 118 in a similar way. Typically, the sequence of
instructions during processing is known in advance. Conditional branching operations may
occur in some software applications, but between two branching instructions the instruction
stream is linear and known. If the buffer is a FIFO and stores two or more instructions, the
supply of sequential instructions to unit 112 could start from buffer 126 before completion of
the instruction cache refill and the start of the cache read cycle, or even substantially well
before the completion of the instruction cache refill. To this end, controller 118 has to keep
track of the extent to which the address block has been mapped from main memory 106 to cache
114.
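
The FIFO variant can be sketched as follows. The class, its method names, the FIFO depth and the instruction labels are hypothetical; the sketch only illustrates that instructions can be handed to the execution unit while the controller is still counting how much of the block has arrived.

```python
from collections import deque


class InstructionBuffer:
    """Hypothetical stand-in for a FIFO buffer 126 and its bookkeeping."""

    def __init__(self, depth: int):
        self.depth = depth
        self.fifo = deque()
        self.words_refilled = 0  # extent of the block mapped into the cache so far

    def on_refill_word(self, instruction) -> bool:
        """Record one instruction arriving from main memory during the refill.

        Returns True if the instruction was also captured in the FIFO.
        """
        self.words_refilled += 1
        if len(self.fifo) < self.depth:
            self.fifo.append(instruction)
            return True
        return False

    def supply(self):
        """Hand the oldest buffered instruction to the execution unit, if any."""
        return self.fifo.popleft() if self.fifo else None


# Block of four instructions; only two have arrived so far.
buf = InstructionBuffer(depth=2)
buf.on_refill_word("i0")
buf.on_refill_word("i1")
print(buf.supply(), buf.words_refilled)  # i0 2
```

Here the first instruction is supplied after two of the four words have been refilled, that is, before the refill has completed.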

Fig.2 is a diagram of a part of a finite state machine 200 in cache controller 120. 
State machine 200 is explained in the following. 

A stall cycle in a pipelined processor such as the MIPS 3000 is a cycle wherein
the CPU waits for some event without doing useful work. For background information on the
pipeline in the MIPS 3000, see, e.g., "MIPS RISC Architecture", Gerry Kane and Joe
Heinrich, Prentice Hall, 1992, especially Chapter 1. The invention reduces the number of stall
cycles upon a cache miss. In the MIPS 3000 the sequence of events upon a cache miss is the
following. First, the designated blocks in the cache are refilled. Then, after the refill is
completed, missed data is read from the cache and supplied to the execution unit. During both
the refill phase and the cache read phase, the execution unit is stalled. In the invention,
however, execution unit 112 is not stalled during the cache read phase. Read data is stored
temporarily in buffer 122 during the refill and is supplied to the execution unit from buffer 122
and not from cache 116. In this manner at least one clock cycle is saved per individual cache
miss. This is illustrated by state machine 200 that comprises the following transitions.

Transition 202 from "read" to "refill" corresponds to a cache miss. Transition 204 from "read"
to "non-cache" corresponds to a non-cacheable access to main memory (or secondary cache)
106. Transition 206 between "refill" and "wait" corresponds to the refill being completed and
the execution unit not being ready yet. Transition 208 between "non-cache" and "wait"
corresponds to a non-cacheable word being fetched from main memory 106, and execution
unit 112 not being ready yet to accept it. Transition 210 between "refill" and "read"
corresponds to the refill being completed and execution unit 112 being ready. Transition 212
between "non-cache" and "read" corresponds to a non-cacheable word being fetched from
main memory and execution unit 112 being ready. Transition 214 between "wait" and "read"
corresponds to execution unit 112 being ready to accept the data. During the "refill" and
"non-cache" states CPU 102 places the requested data in buffer 122 and communicates the data to
execution unit 112 during transitions 210, 212 and 214.
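
The described part of state machine 200 can be written down as a transition table. This is a minimal sketch: the event names are hypothetical labels for the conditions given in the text, and the numeral 208 for the transition from "non-cache" to "wait" is inferred from the numbering sequence rather than read from the drawing.

```python
# (current state, event) -> (next state, reference numeral of the transition)
TRANSITIONS = {
    ("read", "cache_miss"): ("refill", 202),
    ("read", "non_cacheable_access"): ("non-cache", 204),
    ("refill", "refill_done_unit_not_ready"): ("wait", 206),
    ("non-cache", "word_fetched_unit_not_ready"): ("wait", 208),
    ("refill", "refill_done_unit_ready"): ("read", 210),
    ("non-cache", "word_fetched_unit_ready"): ("read", 212),
    ("wait", "unit_ready"): ("read", 214),
}

# Transitions during which the buffered data is passed to the execution unit.
SUPPLY_FROM_BUFFER = {210, 212, 214}


def step(state: str, event: str):
    """Return (next state, transition numeral, whether the buffer supplies data)."""
    next_state, numeral = TRANSITIONS[(state, event)]
    return next_state, numeral, numeral in SUPPLY_FROM_BUFFER


# A read miss where the execution unit is ready as soon as the refill completes.
state, numeral, supplied = step("read", "cache_miss")
print(state, numeral, supplied)  # refill 202 False
state, numeral, supplied = step(state, "refill_done_unit_ready")
print(state, numeral, supplied)  # read 210 True
```

A pair of state and event that is absent from the table raises a KeyError, which reflects that only part of the state machine is described here.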
CLAIMS:



1. An electronic circuit comprising a CPU having: 

- an input for receipt of an information item (110); 

- an execution unit (112) for processing the information item; 

- a cache (114 or 116) between the input and the execution unit;

- a buffer (126 or 122) between the input and the execution unit for storing the item; and

- a buffer controller connected to the buffer for control of storing of the item in the buffer and 
of supplying the item to the execution unit substantially near completion of a cache refill. 

2. The circuit of claim 1, wherein the buffer controller comprises a cache
controller (118 or 120).

3. The circuit of claim 1, wherein the cache comprises a data cache (116).

4. The circuit of claim 1, wherein the cache comprises an instruction cache (114).


5. An electronic circuit comprising a CPU having: 

- a data input (110) for receipt of data; 

- an instruction input (110) for receipt of an instruction; 

- an execution unit (112) for processing the data under control of the instruction;

- a data cache (116) between the data input and the execution unit;

- a data buffer (122) between the data input and the execution unit for storing the data; 

- a data buffer controller (120) connected to the data buffer for control of storing of the data in 
the data buffer and of supplying the data to the execution unit substantially near completion of 
a data cache refill; 

- an instruction cache (114) between the instruction input and the execution unit;

- an instruction buffer (126) between the instruction input and the execution unit for storing
the instruction; and

- an instruction buffer controller (118) connected to the instruction buffer for control of storing
of the instruction in the instruction buffer and of supplying the instruction to the execution unit
substantially near completion of an instruction cache refill.

6. A method of information processing with an electronic circuit that has an
execution unit for processing an information item and a cache (114 or 116), the method
comprising storing the item in a buffer and supplying the item to the execution unit
substantially near completion of a cache refill.